BE-457: HashQL: MIR execution pipeline extensions for postgres compilation#8525
BE-457: HashQL: MIR execution pipeline extensions for postgres compilation#8525indietyp wants to merge 4 commits into
Conversation
feat: checkpoint (II) feat: checkpoint (III) feat: snapshot vec feat: add dedicated filter feat: checkpoint feat: filter implementation feat: filter implementation (mostly) done chore: environment capture note chore: always postgres bigint feat: target clone feat: simplify lookup feat: move storage up feat: eval entity path chore: checkpoint chore: checkpoint chore: find entrypoint feat: eval context feat: eval cleanup chore: cleanup feat: track index feat: wire up filter feat: add error reporting chore: checkpoint feat: add traverse, and first postgres compiler outline feat: traverse bitmap feat: move traversal out feat: projections feat: projections fix: clippy feat: subquery projection for lateral feat: checkpoint feat: test plan feat: checkpoint feat: checkpoint – failing tests ;-; feat: checkpoint – failing tests ;-; feat: checkpoint — passing tests fix: import fix: entity type feat: checkpoint feat: attribute a cost to terminator placement switches fix: import feat: checkpoint feat: checkpoint chore: lint
PR SummaryMedium Risk Overview Upgrades island/traversal plumbing for SQL generation: Changes placement behavior by introducing a fixed backend switch overhead in terminator transition costs (via Reviewed by Cursor Bugbot for commit 39f364e. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
🤖 Augment PR SummarySummary: Prepares the HashQL MIR execution pipeline for Postgres compilation by returning a full island dependency graph and exposing traversal/placement APIs needed by the SQL generator.
entity_uuid_equality).
Notes: Backend switch costs are heuristics intended to be refined by a measured cost model.
🤖 Was this summary useful? React with 👍 or 👎 |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8525 +/- ##
========================================
Coverage 62.31% 62.31%
========================================
Files 1354 1354
Lines 137003 137313 +310
Branches 5792 5793 +1
========================================
+ Hits 85372 85567 +195
- Misses 50725 50839 +114
- Partials 906 907 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Merging this PR will not alter performance
Comparing Footnotes
|
6ef6d13 to
ef9b858
Compare
5d95ba7 to
1aa0f1c
Compare
ef9b858 to
a91293c
Compare
1aa0f1c to
87024df
Compare
3050d4a to
5552cbb
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5552cbb. Configure here.
5552cbb to
60da845
Compare
60da845 to
fd9a3a8
Compare
4f33a35 to
959fba1
Compare
fd9a3a8 to
44ba0c2
Compare
44ba0c2 to
39f364e
Compare
thehabbos007
left a comment
There was a problem hiding this comment.
Looks good nothing stood out to me that needed to change
| } | ||
|
|
||
| impl<'alloc, A: Allocator, S: BumpAllocator> PlacementSolver<'_, 'alloc, A, S> { | ||
| impl<'alloc, S1: Allocator, S: BumpAllocator> PlacementSolver<'_, 'alloc, S1, S> { |
There was a problem hiding this comment.
small nit: I suppose it is slighly confusing to me that we name the allocator S1 and bump allocator S. Is there a reason why this naming is used instead of A/B or other separate letters? I suppose S' would have been slightly nicer but S1 does fill that wish :)
There was a problem hiding this comment.
I would've called it S' (but we live in a world where that isn't a thing in Rust :P) I work through this upstream, in general there should only ever be two allocators present: A, the allocator that we deposit into, and S, the scratch that isn't long-term persistent. That's it (hence the naming here). I believe that I reconcile this properly later. The issue becomes one we have the A inside of the function, instead of the trait because then A can be anything, and we create three different generics instead.


🌟 What is the purpose of this PR?
Prepares the MIR execution pipeline for consumption by the postgres compiler. The execution analysis now produces a complete
IslandGraph(not just a flat island list), the placement solver and island placement accept external allocators, and the traversal system gains theTraversalPathBitMapandas_symbol()APIs that the SQL generator needs. Also adds a backend switch cost to the terminator placement so cross-backend transitions are no longer free.🔍 What does this change?
Execution analysis (
pass/execution/mod.rs):run()becomesrun_in()with an explicit allocator parameter, returningExecutionAnalysisResidual(assignment + island graph). The island graph is now constructed as part of the analysis rather than left to the caller.run_all_in()which runs the execution analysis over all graph-read bodies in aDefIdSlice.Traversal system (
traversal/mod.rs,traversal/entity.rs):TraversalPathBitMap: per-vertex-type collection ofTraversalPathBitSets with pointwise lattice operations. The postgres compiler uses this to track which paths each island accesses across all vertex types.TraversalPath::as_symbol(): returns a static symbol for each path variant, used as SQL column aliases so the interpreter can locate result columns by name.TraversalPathBitSet::vertex(): returns the vertex type for a bitset.EntityPath::as_symbol()andEntityPath::column_name(): per-path SQL identifiers.Terminator placement (
terminator_placement/mod.rs):TransMatrixgainsAddAssignfor element-wise saturating addition.backend_switch_cost()which encodes a fixed overhead for cross-backend transitions (Postgres to Interpreter: 8, Interpreter to Embedding: 4, etc.). Previously cross-backend transitions had zero inherent cost, so empty blocks were arbitrarily assigned to the interpreter even when staying on postgres was free.Placement solver (
placement/solve/):PlacementSolver::run()becomesrun_in()with allocator parameter.Island graph (
island/graph/mod.rs):IslandGraph::new_in()now takes an allocator for its output storage.Pretty printer (
pretty/text.rs):TextFormatAnnotationsgainsannotate_basic_block()andBasicBlockAnnotationassociated type, plus a blanket impl for&mut T.Builder (
builder/rvalue.rs):RValueBuilder::opaque_entity_uuid()convenience constructor for the common pattern of extracting an entity's UUID.Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
The
backend_switch_cost()values (8, 4, etc.) are hand-tuned heuristics. They correctly prevent the solver from arbitrarily switching backends for empty blocks, but a proper cost model would derive these from measured overhead.🛡 What tests cover this?
execution/tests.rs) includingentity_uuid_equality,mixed_postgres_embedding_interpreter,projection_and_apply_splitseq_opaque_entity_uuid)❓ How to test this?